Record Linkage I: Evaluation of Commercially Available Record Linkage Software for Use in NASS

Author

  • Charles Day
Abstract

Record linkage is an important technique in NASS for minimizing the presence of duplicate names on its list sampling frame of farm operators and agribusinesses. In the late 1970s, NASS developed an automated record linkage system for this purpose, which runs on an IBM mainframe. With changes in technology, the need has arisen for portability between platforms, integration with client/server technology, and interactive operation. NASS also wants to reduce resource expenditures on record linkage while maintaining the quality of the process. The growing availability of commercial record linkage solutions has made it unnecessary to develop a new record linkage system or to undertake an expensive and difficult rewrite of the old one. This report evaluates six commercially available record linkage software packages for their suitability for NASS's purposes. The report starts with a brief discussion of record linkage in NASS, then discusses the statistical theory behind the most popular probabilistic record linkage solution, that of Fellegi and Sunter. Next, the report discusses the requirements for a NASS record linkage system. Detailed reviews of the six software packages follow. Except for the review of AUTOMATCH, which NASS has tested extensively, these reviews are based on information provided by the software manufacturers. The report concludes that, for NASS's purposes, AUTOMATCH is the best choice. The report ends with a glossary of record linkage terminology and a checklist for the evaluation of record linkage software packages.
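
For readers unfamiliar with the Fellegi and Sunter approach mentioned in the abstract, the following is a minimal sketch of its standard scoring rule, not code from the report or from any of the reviewed packages. Each compared field contributes log2(m/u) when the two records agree and log2((1-m)/(1-u)) when they disagree, where m is the probability of agreement among true matches and u is the probability of agreement among non-matches. The field names, probabilities, thresholds, and function names below are hypothetical values chosen only for illustration.

```python
import math

def field_weight(agrees, m, u):
    """Log2 likelihood-ratio weight for one field comparison.

    m: P(field agrees | records are a true match)
    u: P(field agrees | records are not a match)
    """
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))

def match_weight(agreements, m_probs, u_probs):
    """Sum the per-field weights, assuming fields are conditionally independent."""
    return sum(
        field_weight(agrees, m_probs[field], u_probs[field])
        for field, agrees in agreements.items()
    )

def classify(weight, lower, upper):
    """Three-way decision: link, non-link, or possible link (clerical review)."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"

# Hypothetical parameters for two fields of a name-and-address list.
m_probs = {"surname": 0.95, "zip": 0.90}
u_probs = {"surname": 0.01, "zip": 0.05}

pair = {"surname": True, "zip": False}  # surname agrees, ZIP code disagrees
total = match_weight(pair, m_probs, u_probs)
print(classify(total, lower=-2.0, upper=6.0))
```

Pairs whose total weight falls between the two thresholds are the ones a probabilistic system would typically route to manual review.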

Similar resources

Probabilistic Linkage of Persian Record with Missing Data

Extended Abstract. When the comprehensive information about a topic is scattered among two or more data sets, using only one of those data sets would lead to information loss available in other data sets. Hence, it is necessary to integrate scattered information to a comprehensive unique data set. On the other hand, sometimes we are interested in recognition of duplications in a data set. The i...

Comparison of probabilistic and deterministic record linkage in the development of a statewide trauma registry.

We have been working to develop a statewide injury surveillance system using not only hospital-based trauma registries but also other sources of data (including ambulance run reports, hospital discharge abstracts, and death certificates). For this purpose, a commercially available probabilistic matching program was compared to the deterministic program described previously. Using the same data ...

Febrl – A Freely Available Record Linkage System with a Graphical User Interface

Record or data linkage is an important enabling technology in the health sector, as linked data is a cost-effective resource that can help to improve research into health policies, detect adverse drug reactions, reduce costs, and uncover fraud within the health system. Significant advances, mostly originating from data mining and machine learning, have been made in recent years in many areas of ...

An Empirical Comparison of Approaches to Approximate String Matching in Private Record Linkage

Due to the frequency of spelling and typographical errors in practical applications, record linkage algorithms have to use string similarity functions. In many legal contexts, identifiers such as names have to be encrypted before a record linkage can be attempted. Therefore, algorithms for computing string similarity functions with encrypted identifiers are essential for approximating string ma...

Scaling Record Linkage to Non-uniform Distributed Class Sizes

Record linkage is a central task when information from different sources is integrated. Record linkage models use so-called blockers for reducing the search space by discarding obviously different record pairs. In practice, important problems have Zipf distributed class sizes with some large classes where blocking is not applicable any more. Therefore we propose two novel meta algorithms for sc...

Publication date: 2007